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Abstract 

Principal Component Analysis (PCA) is the most widely used tool for linear dimensionality reduction and clustering. Still it is 
highly sensitive to outliers and does not scale well with respect to the number of data samples. Robust PCA solves the first issue 
with a sparse penalty term. The second issue can be handled with the matrix factorization model, which is however non-convex. 
Besides, PCA based clustering can also be enhanced by using a graph of data similarity. In this article, we introduce a new model 
called ‘Robust PCA on Graphs’ which incorporates spectral graph regularization into the Robust PCA framework. Our proposed 
model benefits from 1) the robustness of principal components to occlusions and missing values, 2) enhanced low-rank recovery, 
3) improved clustering property due to the graph smoothness assumption on the low-rank matrix, and 4) convexity of the resulting 
optimization problem. Extensive experiments on 8 benchmark, 3 video and 2 artificial datasets with corruptions clearly reveal that 
our model outperforms 10 other state-of-the-art models in its clustering and low-rank recovery tasks. 
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Fig. 1: Main idea of our work. 


1. Introduction 

What is PCA? Given a data matrix X G with n p-dimensional data vectors, the classical PCA can be formulated as 

learning the projection Q G of X in a (i-dimensional linear space characterized by an orthonormal basis U G (1^^ 

model in Table I). Traditionally, U and Q are termed as principal directions and principal components and the product f/Q is 
known as the low-rank approximation L e of X. 

Applications of PCA: PCA has been widely used for two important applications. 

1) Dimensionality reduction or low-rank recovery. 

2) Data clustering using the principal components Q in the low dimensional space. 

However, classical PCA formulation is susceptible to errors in the data because of the quadratic term. Thus, an outlier in 
the data might result in erratic principal components Q which can effect both the dimensionality reduction and clustering. 

Robust dimensionality reduction using PCA: Candes et. al. [4] demonstrated that PCA can be made robust to outliers by 
exactly recovering the low-rank representation L even from grossly corrupted data X by solving a simple convex problem, 
named Robust PCA (RPCA, 2^^ model in Table I). In this model, the data corruptions are represented by the sparse matrix 
S G Extensions of this model for inexact recovery [20] and outlier pursuit [17] have been proposed. 

The clustering quality of PCA can be significantly improved by incorporating the data manifold information in the form 
of a graph G [7], [19], [8], [15]. The underlying assumption is that the low-dimensional embedding of data X lies on a smooth 
manifold [1]. Let G = (V^E^A) be the graph with vertex set V as the samples of X, E the set of pairwise edges between 
the vertices V and A the adjacency matrix which encodes the weights of the edges E. Then, the normalized graph Laplacian 
matrix ^ G which characterizes the graph G is defined as ^ — A)D~^^‘^ = I — , where D is 

the degree matrix defined as D = diag((ii) and di = Aij. Assuming that a p-nearest neighbors graph is available, several 
methods to construct A have been proposed in the literature. The three major weighting methods are: 1) binary, 2) heat kernel 
and 3) correlation distance [8]. 

In this paper we propose a novel convex method to improve the clustering and low-rank recovery applications of PCA 
by incorporating spectral graph regularization to the Robust PCA framework. Extensive experiments reveal that our proposed 
model is more robust to occlusions and missing values as compared to 10 state-of-the-art dimensionality reduction models. 
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TABLE I: A comparison of various PC A models and their properties. X is the data matrix, U G and Q G 

are the principal directions and principal components in a d dimensional linear space (rank = d). L = UQ G is the 

low-rank representation and S ^ is the sparse matrix. and G characterize a simple graph or a hypergraph 

G between the samples of X. || • jji?, || • ||* and || • ||i denote the Frobenius, nuclear and h norms respectively. 



Model 

Objective 

Constraints 

Parameters 

Graph? 

Factors? 

Convex? 

1 

PCA 

miny.Q ||X - UQWi, 

U'^U = I 

d 

no 

yes 

no 

2 

RPCA [4] 

II^IU + Alls'll! 

X = L + S 

A 

no 

no 

yes 

3 

PROPOSED 

miriL^S l|L|U +A||S||i + 7 tr(L^L'-^') 

X = L + S 

A, 7 

YES 

NO 

YES 

4 

GLPCA [ ] 

min[ 7 ,Q IIX - UQWj^ + 7 tr(QT>Q'^') 

<0 

<0 

II 

d, 7 

yes 

yes 

no 

5 

RGLPCA [ ] 

||X - UQ\\2X + 7 tr(QT>Q^ ) 

6 

MMF [19] 

min[7,Q IIX - UQWf + 7 tr(QT>Q^‘) 

U^U = I 

7 

MMMF [15] 

mine/,Q,a - UQWj, + ag^!^)Q^ ) + ^||a||^ 

U^U = I 

= 1 


8 

MHMF [ 8 ] 

min( 7 ,Q,o, ||X - UQWj, + ) 


II. Main Idea & Proposed Model 

The main contributions of our work are: 

1) Exact recovery of the low-rank representation L from grossly corrupted data X. 

2) Recovery of the low-rank representation L that also reveals the underlying class separation. 

3) High cluster purity in the low dimensional space even when no clean data is available. 

4) A simple convex optimization problem with minimal parameters to achieve these objectives. 

The figure on page 1 illustrates the main idea of our work. 

Without any doubt the contributions 1 and 4 are given by the work of Candes et. al. [4], namely Robust PCA (RPCA, 2^^ 
model in Table I). Thus, as a first step, we propose that instead of utilizing a classical PCA-like model so as to explicitly learn 
principal directions U and principal components Q of data X, one can directly recover the low-rank matrix L itself (L = UQ). 

Secondly and more importantly, to achieve contributions 2 and 3 we propose: The low-rank matrix L itself lies on a smooth 
manifold and it can be recovered directly on this manifold. Our proposed model is as follows: 

min ||L||* +Alls'll! + 7 tr(L$L^), (1) 

s.t. X = L + S, 

where the sparse errors in the data are modeled by S and L is the low-rank approximation of X. Parameters A and 7 control 
the amount of sparsity of S and smoothness of L on the graph respectively. We will define our graph in Section VI-C. 

Generalization of Robust PCA: We call our proposed model (1) Robust PCA on Graphs. It is a direct extension of the 
Robust PCA proposed by Candes et. al. [4] with smoothness manifold regularization. That is, setting 7 = 0 in our model (1) 
we obtain the standard model of [4]. 

III. Related Works: Factorized PCA Models 

Both manifold regularization and robustness techniques for PCA have been proposed in the literature, either separately or 
combined [7], [19], [8], [15]. Following the classical PCA model they explicitly learn two factors U and Q such that X ^ UQ. 
We will, therefore refer to these models as factorized PCA models. Furthermore, unlike our model, some of these works [7] 
assume the graph smoothness of principal components Q (instead of L = UQ). 

Jiang et. al. proposed Graph Laplacian PCA (GLPCA) [7] (4^^ model in Table I) which leverages the graph regularization 
of principal components Q using the term ir{Q^Q^) for clustering in the low dimensional space. They also proposed a robust 
version of their model (5^^ model in Table I) and demonstrated the robustness of principal components Q to occlusions in the 
data. 

Zhang and Zhao [19] proposed Manifold Regularized Matrix Factorization (MMF, 6^^ model in Table I) which exploits the 
orthonormality constraint on the principal directions U (contrary to [7]) to acquire a unique low-rank matrix L = UQ for any 
optimal pair U, Q. In this case we have ir{Q^Q^) = ir{UQ^{UQ)'^) = tr(L4>L^), therefore this model implicitly assumes 
the graph smoothness of L. The extensions of this model with an ensemble of graph and hypergraph regularization terms have 
been proposed by Tao et. al. [15] and Jin et. al. [8] respectively (7^^ and 8^^ models in Table I). 

Shortcomings of state-of-the-art: Although the models proposed in [7], [19], [8], [15] leverage the graph to learn enhanced 
class structures, they still suffer from numerous problems. Most of these models are not robust to data corruptions [19], [8], 
[15]. Those which leverage the robustness suffer from non-convexity [7]. An ensemble of graphs or hypergraphs leverages the 
non-linearity of data in an effective manner [8], [15]. However, it makes the models non-convex and the resulting alternating 
direction methods can get stuck in local minima. 

Notation & Terminology: Throughout this article || • H^, || • ||* and || • ||i denote the Frobenius, nuclear and h norms 
respectively. We will refer to the regularization term tr{Q^Q^) as principal components graph regularization. Note that the 
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graph regularization involves principal components Q, not principal directions U. We will also refer to the regularization term 
tr(l/<hl/^) as low-rank graph regularization. RPCA and our proposed models {2^^ and 3^^^ models in Table I) which perform 
exact low-rank recovery by splitting X = L ^ S will be referred to as non-factorized PCA models. A comparison of various 
PCA models introduced so far is presented in Table 1. Note that only RPCA and our proposed model leverage convexity and 
enjoy a unique global optimum with guaranteed convergence. 

IV. Comparison with Related Works 

The main differences between our model (1) and the various state-of-the-art factorized PCA models [ ], [19], [12], [ 8 ], [15] 
are, as summarized in Table I, the following. 

Non-factorized model: Instead of explicitly learning the principal directions U and principal components Q, it learns their 
product, i.e. the low-rank matrix L. Hence, (1) is a non-factorized PCA model. 

Exact low-rank recovery: Unlike factorized models we target the exact low-rank recovery by modeling the data matrix as 
the sum of low-rank L and a sparse matrix S. 

Different graph regularization term: Model (1) is based on the assumption that it is the low-rank matrix L that is smooth 
on the graph, and not just the principal components matrix Q. Therefore we replace the principal components graph term 
tr((5^Q^) with the low-rank graph term tr(I/$I/^). Note that as explained in Section III, the two terms are only equivalent 
if orthogonality of U is further assumed, as in [19], [ 12 ], [ 8 ], [15] and not in [7]. 

A. Advantages over Factorized PCA Models 

Robustness to gross corruptions for clustering & low-rank recovery: The low-rank graph ir{L^L^) can be more realistic 
than the principal components graph It allows the principal directions U to benefit from the graph regularization 

as well (recall that L = UQ). Thus, our model enjoys an enhanced low-rank recovery and class separation even from grossly 
corrupted data. For details, please refer to Sections VII & VIII (also see Fig. 13 in supplementary material). 

Convexity: It is a strictly convex problem and a unique global optimum can be obtained by using standard methods like an 
Alternating Direction Method of Multipliers (ADMM) [2] . 

One model parameter only: Our model does not require the rank of L to be specified up-front. The nuclear norm relaxation 
enables the automatic selection of an appropriate rank based on the parameters A and 7. Furthermore, as illustrated in our 
experiments, the value A = 1/proposed in [4] gives very good results. As a result, the only unknown parameter 
to be selected is 7, and for this we can use methods such as cross validation (For additional details please see Figs. 16 & 17 
in the supplementary material). 


V. Optimization Solution 
We use an ADMM [2] to rewrite Problem (1): 

xmn\\L\U + X\\S\\,+^tviW^W^) 
s.t. X = L + S, L = W. 


Thus, the augmented Lagrangian and iterative scheme are: 


(L, S, W)’^+^ = argmin ||L||* + Alls'll! + 

L,S,W 

+ {Z^,X-L-S) +L||x-L-S|||. 

+ {Zl,W-L) + '^\\W-L\\% 

7+1 = z^ +ri{X - - S*+1), 

= Z^ + 


where Zi e and Z 2 G R^^" are the lagrange multipliers and k is the iteration index. Let = X — + Z^jri and 


Hi = 


Zllr 2 , then this reduces to the following updates for L, S and W as: 


= 


prox 


(ri+r'2) I 


'nHl + r2Hl 
- n + r2 


). 


S''+i = prox^iisii^ (X - 

=r2(7^ + r2/)“^(L'=+^ - 


where proxj is the proximity operator of the convex function / as defined in [5]. The details of this solution, algorithm, 
convergence and computational complexity are given in the supplementary material Sections A, B & C. 
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VI. Experimental Setup 

We use the model (1) to solve two major PCA-based problems. 

1) Data clustering in low-dimensional space with corrupted and uncorrupted data (Section VII ). 

2) Low-rank recovery from corrupted data (Section VIII). 

Our extensive experimental setup is designed to test the robustness and generalization capability of our model to a wide 
variety of datasets and corruptions for the above two applications. Precisely, we perform our experiments on 8 benchmark, 
3 video and 2 artificial datasets with 10 different types of corruptions and compare with 10 state-of-the-art dimensionality 
reduction models as explained in sections VI-A & VI-B. 

A. Setup for Clustering 

1) Datasets: All the datasets are well-known benchmarks. The 6 image databases include CMU PIE^, ORL^, YALE^ 
COIL20'^, MNIST^ and USPS data sets. MFeat database ^ consists of features extracted from handwritten numerals and the 
BCI database^ comprises of features extracted from a Brain Computer Interface setup. Our choice of datasets is based on 
their various properties such as pose changes, rotation (for digits), data type and non-negativity, as presented in Table II in the 
supplementary material. 

2) Comparison with 10 models: We compare the clustering performance of our model with k-means on the original data 
X and 9 other dimensionality reduction models. These models can be divided into two categories. 

1. Models without graph: 1) classical Principal Component Analysis (PCA) 2) Non-negative Matrix Eactorization (NME) 
[10] and 4) Robust PC A (RPC A) [4]. 

2. Models with graph: These models can be further divided into two categories based on the graph type. a. Principal 
components graph: 1) Normalized Cuts (NCuts) [14], 2) Laplacian Eigenmaps (LE) [1], 3) Graph Laplacian PCA (GLPCA) 
[7], 4) Robust Graph Laplacian PCA (RGLPCA) [7], 5) Manifold Regularized Matrix Eactorization (MME) [19], 6) Graph 
Regularized Non-negative Matrix Eactorization (GNME) [3], b. Low-rank graph: Our proposed model. 

Table II and Eig. 9 in the supplementary material give a summary of all the models. 

3) Corruptions in datasets: Corruptions in image databases: We introduce three types of corruptions in each of the 6 
image databases: 

1. No corruptions. 

2. Fully corrupted data. Two types of corruptions are introduced in all the images of each database: a) Block occlusions 
ranging from 10 to 40% of the image size, b) Missing pixels ranging from 10% to 40% of the total pixels in the image. These 
corruptions are modeled by placing zeros uniformly randomly in the images. 

3. Sample specific corruptions. The above two types of corruptions (occlusions and missing pixels) are introduced in only 
25% of the images of each database. 

Corruptions in non-image databases: We introduce only full and sample specific missing values in the non-image databases 
because the block occlusions in non-image databases correspond to an unrealistic assumption. Example missing pixels and 
block occlusions in the image are shown in Eig. 10 in the supplementary material. 

Pre-processing: Eor NCuts, LE, PCA, GLPCA, RGLPCA, MME, RPCA and our model we pre-process the datasets to zero 
mean and unit standard deviation along the features. Additionally for MME all the samples are made unit norm as suggested 
in [19]. Eor NME and GNME we only pre-process the non-negative datasets to unit norm along the samples. We perform 
pre-processing after introducing the corruptions. 

4) Clustering Metric: We use clustering error to evaluate the clustering performance of all the models considered in this 
work. The clustering error is = (^ X]r=i ^ where is the number of misclassified samples in cluster r. We 
report the minimum clustering error from 10 runs of k-means (k = number of classes) on the principal components Q. This 
procedure reduces the bias introduced by the non-convex nature of k-means. Eor RPCA and our model we obtain principal 
components Q via SVD of the low-rank matrix L = UT^Q during the nuclear proximal update in every iteration. Eor more 
details of the clustering error evaluation and parameter selection scheme for each of the models, please refer to Eig. 11 and 
Table III of the supplementary material. 

^http://vasc.ri.cmu.edu/idb/html/face/ 

^http://www.cLcam.ac.uk/research/dtg/attarchive/facedatabase.htnil 
^http://vision.ucsd.edu/content/yale-face-database 
^http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php 
^http://yann.lecun.coni/exdb/mnist/ 

^https://archive.ics.uci.edu/ml/datasets/Multiple+Eeatures 
^ http: // Olivier, chapelle. cc/s sl-book/benchmarks .html 
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B. Setup for Low-Rank Recovery 

Since the low-rank ground truth for the 8 benchmark datasets used for clustering is not available, we perform the following 
two types of experiments. 

1) Quantitative evaluation of the normalized low-rank reconstruction error using corrupted artificial datasets. 

2) Visualization of the recovered low-rank representations for 1) occluded images of the CMU PIE dataset and 2) static 
background of 3 different videos^. 


C. Normalized Graph Laplacian 


In order to construct the graph Laplacian the pairwise Euclidean distance is computed between each pair of the vectorized 
data samples (xi^Xj). Let Ll be the matrix which contains all the pairwise distances, then Llij, the Euclidean distance between 
Xi and Xj is given as: 

Case I: Block Occlusions 


— 


I \\Mij o {xj Xj 


2 

2 


V 11^^ 


ij\\l 


where Mij G {0,1}^ is the vector mask corresponding to the intersection of uncorrupted values in Xi and Xj. Thus 

_ f 1 if features x\ & x^- observed, 

I 0 otherwise 

Thus, we detect the block occlusions and consider only the observed pixels. 

Case II: Random Missing Values: We use a Total Variation denoising procedure and calculate using the cleaned 
images. 

Let uJmin be the minimum of all the pairwise distances in ft. Then the adjacency matrix A for the graph G is constructed 
by using 

Aij = exp ( - 

Einally, the normalized graph Laplacian ^ = / — is calculated, where D is the degree matrix. Note that 

different types of data might call for different distance metrics and values of cr^, however for all the experiments and datasets 
used in this work the Euclidean distance metric and = 0.05 work well. To clarify further for the reader, we present a 
detailed example of graph construction with corrupted and uncorrupted images in Lig. 12 of supplementary material. 


VII. Clustering in Low Dimensional Space 

Lig. 2 presents experimental results for the first important application of our model, i.e. clustering. Lig. 3 illustrates the 
principal components Q for three classes of occluded CMU PIE data set in 3-dimensional space. Our model outperforms others 
in most of the cases with different types and levels of occlusions (please refer to Tables IV, V & VI in the supplementary 
material for additional results). In the next few paragraphs we elaborate further on 1) the advantage of graph over non-graph 
models, 2) the advantage of low-rank graph over principal components graph. Throughout the description of our results, the 
comparison with RPCA is of specific interest because it is a special case of our proposed model. 


A. Is the Graph Useful? 

Our model performs better than k-means, standard PC A, NML and RPCA which do not leverage the graph. 

Example case I: CMU PIE database with no pose variation: Consider the case of CMU PIE dataset in Lig. 2. This dataset 
does not suffer from pose changes and we observe that our model attains as low as 0% error when there are no corruptions in 
the data. Lurthermore, we attain lowest error even with the increasing levels of occlusions and missing pixels. This can also 
be observed visually from Lig. 3 where the principal components for our model are better separated than others. 

Example case II: COIL20 database with pose variation: Our model outperforms RPCA and other non-graph models also 
for the COIL20 database which suffers from significant pose changes. Thus, even a graph constructed using the simple scheme 
of Section VI-C enhances the robustness of our model to gross data corruptions and pose changes. Similar conclusions can be 
drawn for all the other databases as well. 

^https://sites.google.com/site/backgroundsubtraction/test-sequences 





6 


Q 

C/3 

No 

Sample specific 
corruptions 

Corruptions on the whole data set 

Q 

Corruptions 

Occlusions Missing pixels 
(25% of (25% of 

Occlusions (% of image size) 

Missing values (% of image size) 



image size) image size) 150 ^ 

^0 25% 40% 

10 % 20 % 30 

% 4( 

0% 

CMU PIE 

80 

60 1 

iiiy.h 

im 

ihlfa 

ih 

ih 

i 

ORL 

80 - 

60 

40 

Ilk hilt h 

Lii 

iluh 

ii 

■ki 

li 

YALE 

80 - 

nil 

liM 

iiii 

in 

II 

1 

CO1L20 

60 

40 - L 

:lki 

hi hh 1 

Hk-meajis HNCuts [HI PCA H 

ik 14 hill 

]GLPC A IZlRGLPC A IZlMMF ■ 

iki 

INMF HGNMF MRPi 

Ihl 

□AHOut mcK 

j 

del 

li 


Fig. 2: Clustering error of various dimensionality reduction models for CMU PIE, ORL, YALE & COIL20 datasets. The 
compared models are: 1) k-means without dimensionality reduction 2) Normalized Cuts (NCuts) [14] 3) Laplacian Eigenmaps 
(LE) [1] 4) classical Principal Component Analysis (PC A) 5) Graph Laplacian PC A (GLPCA) [7] 6) Robust Graph Laplacian 
PC A (RGLPCA) [7] 7) Manifold Regularized Matrix Eactorization (MME) [19] 8) Non-negative Matrix Eactorization [10] 
9) Graph Regularized Non-negative Matrix Eactorization (GNME) [3], 10) Robust PCA (RPCA) [4] and 11) Robust PCA on 
Graphs (proposed model). Two types of full and partial corruptions are introduced in the data: 1) Block occlusions and 2) 
Random missing values. The numerical errors corresponding to this figure along with additional results on MNIST, USPS, 
MEeat and BCI datasets are presented in Tables IV, V & VI of the supplementary material. 
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Eig. 3: Principal components Q of three classes of CMU PIE data set in 3 dimensional space. Only ten instances of each class 
are used with block occlusions occupying 20% of the entire image. Our proposed model (on the lower right comer) gives 
well-separated principal components without any clustering error. 
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B. Low-Rank or Principal Components Graph? 

We compare the performance of our model with NCuts, LE, GLPCA, RGLPCA, MMF & GNMF which use principal 
components graph. It is obvious from Fig. 2 that our model outperforms the others even for the datasets with pose changes. 
Similar conclusions can be drawn for all the other databases with corruptions and by visually comparing the principal 
components of these models in Fig. 3 as well. 

Unlike factorized models, the principal directions U in the low-rank graph tr(I/$I/^) benefit from the graph regularization 
as well and show robustness to corruptions. This leads to a better clustering in the low-dimensional space even when the graph 
is constructed from corrupted data (please refer to Fig. 13 of the supplementary material for further experimental explanation). 

C. Robustness to Graph Quality 

We perform an experiment on the YAFE dataset with 35% random missing pixels. Fig. 4 shows the variation in clustering 
error of different PCA models by using a graph constructed with decreasing information about the mask. Our model still 
outperforms others even though the quality of graph deteriorates with corrupted data. It is essential to note that in the worst 
case scenario when the mask is not known at all, our model performs equal to RPCA but still better than those which use a 
principal components graph. 



BgLPCA BrGLPCA OmMF HgNMF BrPCA BOur model 


Fig. 4: Effect of lack of information about the corruptions mask on the clustering error. YAFE dataset with 35% random 
missing pixels is used for this experiment. Our model (last bar in each group) outperforms the others. 


VIIL Fow-Rank Recovery 

A. Low-Rank Recovery from Artificial Datasets 

To perform a quantitative comparison of exact low-rank recovery between RPCA and our model, we perform experiments 
on 2 artificial datasets as in [4]. Note that only RPCA and our model perform exact low-rank recovery so we do not present 
results for other PCA models. We generate low-rank square matrices L = A^B G (rank d = 0.02n to 0.3n) with A and 

B independently chosen dx n matrices with i.i.d. Gaussian entries of zero mean and variance 1/n. We introduce 6% to 30% 
errors p = ||5'||o/n^ in these matrices from an i.i.d. Bernoulli distribution for support of S with two sign schemes. 

1) Random signs: Each corrupted entry takes a value ±1 with a probability p/2. 

2) Coherent signs: The sign for each corrupted entry is coherent with the sign of the corresponding entry in L. 

Fig. 5 compares the variation of log normalized low-rank reconstruction error log(||I/ — I/||i7/||I/||i?) with rank (x-axis) and 
error (y-axis) between RPCA (b and d) and our model (a and c). The larger and darker the blue region, the better is the 
reconstruction. The first two plots correspond to the random sign scheme and the next ones to coherent sign scheme. It can 
be seen that for each (rank,error) pair the reconstruction error for our model is less than that for RPCA. Hence, the low-rank 
graph helps in an enhanced low-rank recovery. 

B. Low-Rank Recovery from Corrupted Faces 

We use the PCA models to recover the clean low-rank representations from a set of occluded images of CMU PIE dataset 
as shown in Fig. 6. None of the factorized models using principal components graph is able to perfectly separate the block 
occlusion from the actual image. This is not surprising because these models (except RGFPCA) are not robust to gross errors. 
Our model is able to separate the occlusion better than all the models. Even though it inherits the robustness from RPCA, 
we observe that the robust recovery of the low-rank representation L is greatly enhanced by using the low-rank graph. Please 
refer to Fig. 14 in the supplementary material for more results. 
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(b) RPCA, random signs 
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Fig. 5: Variation of Log normalized low-rank reconstruction error on artificial data (Section VIII- A) with (rank, error). The 
larger and darker the blue area, the lower reconstruction error of the model. 


original whitened (occluded, un-occluded) GLPCA RGLPCA MMF RPCA Proposed 



Fig. 6: Clean low-rank recovery of one image of the CMU PIE data set corresponding to each of the PC A models. The images 
of one person are corrupted by 10% block occlusions. figure shows the actual occluded image, 2^^ and 3^^^ figures show 
the whitened occluded and un-occluded images. Since PC A requires whitening, the recovered low-rank images in figures 4 to 
8 should resemble the un-occluded whitened image. 



Fig. 7: One frame of the recovered background from a video of a shopping mall lobby, left) actual frame, middle) recovered 
low-rank background using Robust PCA and right) recovered background using our proposed model. The presence of graph 
in our model enables better recovery and removes all the walking people from the frame. 


C. Low-Rank Background Extraction from Videos 

Static background separation from the dynamic foreground is an interesting application of low-rank recovery. We use 3 
videos ^ (1000 frames each) to recover the static low-rank background. The graph Laplacian 4> is constructed between the 
frames of videos without utilizing any prior information about sparse errors. The low-rank ground truth is not available for 
these videos so we present a visual comparison between RPCA and our model for one of the frames in Fig. 7. The RPCA 
model (middle) is unable to completely remove the person in the middle of the frame and his shadow from the low-rank 
frame. However, the presence of graph in our model (right) helps in a better recovery of the static background, which in this 
case is the empty shopping mall lobby. The pictures are best viewed on a computer screen on the electronic version of this 
article. Complete videos for our model can be found here ^ For more results please refer to Fig. 15 of supplementary 
material. 


IX. PARAMETER SELECTION SCHEME EOR OUR MODEL 

There are two parameters for our model: 1) Sparsity penalty A and 2) Graph regularization penalty 7. The parameter A can 
be set approximately equal to 1/^max(n,p) where n is the number of data samples and p is the dimension. This simple rule, 

^https://vid.me/GN0X 
^^https://vid.me/vR6d 
^ ^ https ://vid.me/RDgN 
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as suggested by Candes et. al. [4] works reasonably well for our clustering & low-rank recovery error experiments. After fixing 
A, the parameter 7 can be easily selected by cross-validation. The minimum clustering error always occurs around this value of 
A, irrespective of the size, number of classes and % of corruptions in the datasets. Due to lack of space we present the detailed 
results on the variation of clustering & low-rank recovery error over the (A, 7) grid in Figs. 16 & 17 of the supplementary 
material. 


X. Comparison of Computation Times 

We corrupt different sizes of the CMU PIE dataset (n = 300, 600 and 1200) with 20% occlusions and compute the time 
for one run of each model which gives the minimum clustering error (Fig. 8). Clearly, RGLPCA has the highest computation 
time, followed by our proposed model. However, the trade-off between the clustering error and computational time is worth 
observing from Figs. 2 and 8 (more details in the supplementary material Table VII). The large computation time of our model 
is dominated by the expensive SVD step in every iteration. Our future work will be dedicated to reduce this cost by using 
randomized algorithms for SVD [16] and exploiting the parallel processing capabilities [13]. 

XL Conclusion 


1,000 



n = 300 


n = 600 


Fig. 8: Computation time of different models for CMU PIE dataset with 20% occlusions. 


We present ‘Robust PC A on Graphs’, a generalization of the Robust PC A framework which leverages spectral graph 
regularization on the low-rank representation. The proposed model targets exact low-rank recovery and enhanced clustering in 
the low-dimensional space from grossly corrupted datasets by solving a convex optimization problem with minimal parameters. 
Numerical experiments on several benchmark datasets reveal that clustering using our low-rank graph regularization scheme 
outperforms various other state-of-the-art factorized PCA based models which use principal components graph regularization. 
Experiments for exact low-rank recovery from artificial datasets and static background separation from videos demonstrate its 
ability of to perform an enhanced low-rank recovery. On the top of that, it is also robust w.r.t. the graph construction strategy 
and performs well even when the graph is constructed from corrupted data. 

Note: Due to space constraints we present only a few important results in the main text. However, we encourage the reader 
to see the supplementary material for additional results. 
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Appendix 


A. Details of ADMM Solution 

We use the Alternating Direction Method of Multipliers (ADMM) [2]. First, the problem (1) is split as follows: 


mm\\L\U + X\\Sh+^tT{W^W^) 
s.t. X = L + S, L = W. 


Second, the augmented Lagrangian and iterative scheme are introduced: 

{L^+\S^+\W'^+^) = argmin ||L||* + A||5||i + 7 tr(W"i>W"^) + {Z^X - L - S) + ^||X - L - Sfp 

L,S,W 2 

+ {Zl,W-L) + '^\\W-Lfp, ( 2 ) 

zf+i = +ri{X- - 5'=+^), (3) 

=Z^+ r 2 iW'^+^ - (4) 


where Zi and Z 2 are the lagrange multipliers (dual variables) and k is the iteration index. 

In order to update the primal variables L, S and W we consider the definition of proximity operator [5]. Let F G ro(M^), 
where ro(M^) is the class of lower semi-continuous convex functions from to ] — oc, + 00 ] such that domain of F 7 ^ 0 . 
Then, for every x G M^, the following minimization problem admits a unique solution which is known as the proximity 
operator of F denoted by prox^. 

proxp(a;) = argminF(y) + -||a; - y\\l. 


Using this definition, the updates for L, S and W at {kFiy^ iteration using the previous iterates 
be made using the proximity operators. 

Update L 

Keeping only the terms with L in Eq. (2) we get 


= argmin ||L||* + ^ L - (X - F ^ 
L 2 ' ri 


2 

k 2 


= argmin ||i:|U + {Z'[,X - L - S'^) + 7-||X - L - + {Z^W'^ - L) + " Ff 


2 


7 k 2 

L-{W'^ + 

^ r 2 ^ 


= argmin \\L\\ 

L 




ri + r 2 


L- 


riFf + raffi 


ri + r2 


'nH’l + rzifl 

^ ri + r 2 


). 


where = X — F Z\/r\ and F Z^jr 2 . Let denote the element-wise soft-thresholding 

operator ^^-(x) = sgn(x) max(|x| — r, 0), then we can define Dr (A) = PVtr{^)Q^ as the singular value thresholding operator 
for matrix A, where A = PTjQ^ is any singular value decomposition of A. Let A = ^ ~ + ^2)/2 then 


pk+i = Di{A) = PUi (S)Q^. 

r r 


(5) 


Update S 

Following a similar procedure, we can write the update for S as. 


\ ri / 

Update W 

By keeping only the terms with W in eq. (2) we get 


iyfe+i = argmin 7 tr(W"$W"^) + ^ 
w 2 


W 




which is a smooth function in W, so we can use the optimality condition to find a closed 

w’^+'^ = r2(7$ + 




2 

F 

form solution for W. 


Projected Conjugate Gradient method was used to update W. 


( 6 ) 


( 7 ) 























12 


B. Algorithm 


Algorithm 1 ADMM algorithm for Robust PCA on Graphs 


procedure Robust PCA Graphs(A g ^ e A, 7 ) 
k^O 

^ rand{n,p), ^ rand{n^p), 

Ti ^ 1, r2 ^ 1 

Z^^X-L^ - -L^ 

while & 1 


-Po 




Update 1/^+^ using eq. 5 
Update 5'^+^ using eq. 6 
Update using eq. 7 

Update using eq. 2 

Update ^ 2 ^^ using eq. 3 
Update Pi^+\P 2 ^+^ and P 3 ^+^ 
k ^ k 
end while 
return P^+^, 
end procedure 


using step 6. 


^ rand{n,p) 


Pi ^ 

> e & 


7 tr(P^^P' 


-p^ 


) 

> e 


> inputs 
> iteration index k 

> Initialize primal variables 

> Initialize dual variables 

> Initialize primal objective 


& 


-zf 


II I 


> e & 


-z^ 


> e do 


C. Convergence Analysis & Computational Complexity 

Algorithm 1 is a special case of Alternating Directions method [18], [2]. These methods are a subset of more general class of 
methods known as Augmented Lagrange Multiplier methods. The convergence of these algorithms is well-studied [2], [11], [9]. 
This algorithm has been reported to perform reasonably well on a wide range of problems and small number of iterations are 
enough to achieve a good accuracy [4]. The complexity of nuclear norm proximal computation is 0{np^ for n > p and 
the computational complexity of the Conjugate Gradient method for updating W is 0{np), per iteration. Thus, the dominant 
cost of each iteration corresponds to the computation of nuclear proximal operator. Our future work will be dedicated to reduce 
this cost by utilizing a partial SVD or an approximate SVD, as suggested in [6]. Further improvements can be made by using 
randomized algorithms for SVD [16] and exploiting the parallel processing capabilities [13]. Please refer to Section L for a 
detailed comparison of computation time of this algorithm with other models considered in this work. 
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D. Properties of Various Models & Datasets 


TABLE II: A summary of the datasets used for the evaluation of various models. As NMF & GNMF require non-negative 
data so they are not evaluated for USPS, MFeat and BCI datasets. 


Image / 
non-image 

Data 

Type 

Datasets 

Evaluated 

Models 

Image 

Faces 

CMU PIE (no pose changes) 

all 

ORL (pose changes) 

YALE (facial expressions) 

Objects 

COIL20 (pose changes) 

Digits 

MNIST (rotation) 

USPS 

all except 
NME & GNME 

non¬ 

image 

features 

BCI (Brain Computer Interface) 

MEeat (handwritten numerals) 


A summary of all the models evaluated in this work 




Original Image 


15% 


25 % 


40 % 


Fig. 10: Sample images from CMU PIE dataset corrupted with occlusions (first row) and missing pixels (2nd row). 
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E. Evaluation Scheme and Parameter Selection for all Models 


Input data 


Model to be evaluated 



Ncuts, LE, PCA, 
GLPCA, RGLPCA, 
MMF, NMF, GNMF 


RPCA, 

Robust PCA on Graph 


Repeat for each tuple of 
the parameter grid 

Run the model 10 times 


Cluster over 
Q using k-means 
10 times 


Note the minimum error 
over 10 runs of k-means 


Note the minimum error 
over 10 runs of model 


Note the minimum error 
over whole parameter grid 


Repeat for each tuple of 
the parameter grid 

Run the model once 


Determine Q’ 
via SVD of low-rank 
matrix L 


Cluster over 
Q’ using k-means 
10 times 


Note the minimum error 
over 10 runs of k-means 


Note the minimum error 
over whole parameter grid 


Fig. 11: A flow chart describing the evaluation and parameter selection procedure for each of the models considered in this 
work. Each model has several parameters as described in Table III. To perform a fair evaluation between all the models, they 
are run over the entire parameter range and the minimum error is reported. Further, the non-convex models are run 10 times 
(to determine a good local minimum) for every tuple of the parameter range and the minimum error is reported. The k-means 
clustering procedure for each of the models is also evaluated 10 times to avoid the bias introduced by its non-convexity. 
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F. Adjacency Matrix Construction from Corrupted & Uncorrupted Data 


Input images 








Extend the mask in 
case of occlusion 


Extend the mask in 
case of occlusion 


Determine the Euclidean distance ^^12^23^^13 

TX 


Wmin = niin^i2, t2l3, ^^ 23 ) 


Compute edge weights for Adjacency matrix 

= exp ( - ^ ^ = exp ( - (^^3 ^ 


Fig. 12: A procedure for the construction of Adjacency matrix A from clean and corrupted samples X. We assume that the 
block occlusions can be detected and the mask can be extended to the other image of the pair to remove the effect of occlusion 
in the calculation of Euclidean distance. The case of random missing pixels can be handled by using inpainting techniques, 
such as Total Variation denoising. 
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G. Clustering Results on all Databases 


TABLE IV: A comparison of clustering error of our model with various dimensionality reduction models using the procedure 
of Fig. 11. The image data sets include: 1) ORL 2) CMU PIE 3) YALE and 4) COIL20. The compared models are: 1) k-means 
2) Normalized Cuts (NCuts) 3) Laplacian Eigenmaps (LE) [1] 4) Standard Principal Component Analysis (PCA) 5) Graph 
Laplacian PCA (GLPCA) [7] 6) Robust Graph Laplacian PCA (RGLPCA) [7] 7) Manifold Regularized Matrix Factorization 
(MMF) [19] 8) Non-negative Matrix Factorization [10] 9) Graph Regularized Non-negative Matrix Factorization (GNMF) [3] 
and 10) Robust PCA (RPCA) [4]. Two types of full and partial corruptions were introduced in the data: 1) Block occlusions 
and 2) Random missing values. The best results are highlighted in bold. This table provides the numerical errors for the bar 
plots shown in Fig. 2. 


Data Set 

Model 

No 

Corruptions 

Sample specific corruptions 

Full Corruptions 

Occlusions 
(25% of data) 

Missing values 
(25% of data) 

Occlusions 
(% of image size) 

Missing 

(% of image pixels) 

25% image size 

25% image pixels 

15% 

25% 

40% 

15% 

25% 

35% 

50% 

CMU PIE 1 
(faces) 

k-means 

72.2 

72.5 

73.4 

78.5 

79.4 

81.0 

72.3 

76.1 

75.7 

80.3 

NCuts 

41.7 

36.0 

40.0 

42.0 

40.0 

44.0 

35.0 

41.3 

40.7 

46.7 

LE 

41.7 

40.0 

43.3 

49.3 

54.0 

51.3 

46.3 

46.0 

48.3 

55.0 

PCA 

13.0 

25.7 

34.0 

42.7 

45.0 

61.0 

35.0 

50.0 

56.3 

63.7 

GLPCA 

29.8 

27.7 

38.3 

38.0 

40.2 

41.3 

39.3 

37.5 

42.7 

48.2 

RGLPCA 

28.0 

28.3 

31.7 

35.0 

41.3 

39.0 

34.3 

33.7 

28.7 

42.0 

MME 

53.7 

50.0 

55.0 

54.0 

58.0 

60.7 

59.7 

60.3 

60.3 

62.7 

NME 

49.0 

68.0 

70.3 

76.0 

80.0 

79.3 

49.0 

76.0 

80.0 

79.3 

GNME 

52.7 

66.0 

68.3 

71.3 

75.0 

75.3 

52.7 

71.3 

75.0 

75.3 

RPCA 

3.3 

4.3 

11.0 

6.7 

13.3 

23.3 

9.7 

8.3 

17.0 

37.3 

Our model 

0.0 

4.7 

6.9 

5.0 

7.7 

17.7 

7.7 

7.5 

15.7 

34.0 

ORG 

(faces) 

k-means 

35.4 

57.6 

50.8 

64.3 

68.4 

77.5 

36.0 

42.4 

49.4 

57.1 

NCuts 

47.0 

35.7 

43.0 

42.3 

53.0 

61.0 

49.7 

45.7 

42.7 

46.7 

LE 

45.3 

38.0 

34.7 

39.0 

45.3 

54.3 

42.7 

45.0 

44.0 

42.7 

PCA 

28.0 

46.0 

37.3 

53.7 

65.0 

72.6 

34.3 

39.0 

40.3 

40.0 

GLPCA 

27.6 

28.3 

29.7 

33.6 

34.1 

44.3 

30.3 

27.9 

27.6 

33.3 

RGLPCA 

28.3 

28.7 

29.0 

28.3 

36.3 

37.3 

26.0 

29.3 

26.7 

26.3 

MME 

20.3 

29.7 

24.3 

30.0 

34.7 

38.3 

24.7 

28.3 

28.3 

27.3 

NME 

29.0 

31.7 

27.7 

78.0 

79.0 

81.3 

29.7 

39.3 

50.3 

53.3 

GNME 

22.7 

29.0 

25.3 

35.0 

37.0 

39.3 

25.3 

26.7 

28.0 

26.7 

RPCA 

18.6 

27.7 

20.3 

26.0 

35.0 

71.4 

24.7 

26.0 

27.0 

36.0 

Our model 

15.7 

20.0 

14.3 

23.7 

24.7 

31.7 

20.0 

18.3 

20.3 

21.0 

YALE ^ 
(faces) 

k-means 

53.4 

54.9 

64.5 

71.6 

73.4 

73.1 

54.3 

56.4 

58.9 

63.5 

NCuts 

54.5 

57.6 

64.4 

66.7 

64.2 

68.5 

56.4 

56.9 

61.2 

61.8 

LE 

57.6 

57.0 

61.8 

63.6 

68.5 

69.1 

57.8 

59.4 

60.6 

66.7 

PCA 

53.9 

49.7 

58.8 

61.8 

66.1 

70.9 

55.7 

55.1 

62.4 

61.2 

GLPCA 

49.1 

50.9 

53.9 

54.5 

61.2 

58.2 

50.9 

52.7 

48.5 

56.9 

RGLPCA 

48.5 

50.0 

50.9 

54.5 

55.2 

58.8 

50.3 

53.9 

49.7 

50.9 

MME 

38.8 

37.6 

46.7 

55.7 

55.2 

53.9 

38.2 

38.2 

44.2 

49.1 

NME 

63.0 

63.6 

70.3 

72.1 

71.5 

72.1 

64.8 

61.2 

61.8 

66.7 

GNME 

56.9 

55.8 

57.6 

60.6 

64.8 

60.6 

56.9 

56.9 

58.2 

59.4 

RPCA 

39.4 

40.6 

45.5 

61.8 

67.9 

63.0 

42.4 

39.4 

43.6 

63.6 

Our model 

35.1 

35.8 

40.0 

43.6 

46.1 

50.9 

35.2 

35.8 

39.4 

50.3 

COIL20 4 
(objects) 

k-means 

32.0 

55.2 

30.2 

34.9 

45.8 

57.8 

36.3 

34.8 

37.8 

42.8 

NCuts 

44.5 

30.5 

39.5 

38.5 

38.5 

43.5 

38.0 

50.0 

48.0 

47.5 

LE 

38.0 

31.5 

27.0 

34.5 

36.5 

40.0 

31.0 

37.5 

40.0 

39.0 

PCA 

30.5 

43.5 

28.5 

28.0 

37.0 

50.0 

24.0 

30.0 

33.5 

31.0 

GLPCA 

20.5 

22.0 

17.0 

25.0 

25.5 

25.3 

21.7 

21.5 

21.5 

23.3 

RGLPCA 

18.5 

23.5 

17.5 

20.5 

22.0 

23.5 

22.0 

20.0 

22.0 

21.0 

MME 

18.0 

19.0 

11.5 

19.0 

18.0 

25.0 

19.0 

18.0 

19.0 

18.5 

NME 

20.0 

30.0 

19.0 

32.5 

47.0 

60.5 

17.5 

26.0 

24.0 

26.0 

GNME 

15.5 

18.5 

9.5 

20.0 

19.5 

30.0 

14.5 

13.5 

11.0 

17.0 

RPCA 

19.0 

33.0 

13.5 

19.0 

28.0 

50.5 

15.0 

16.0 

16.5 

27.0 

Our model 

15.5 

18.0 

9.0 

18.0 

17.5 

19.5 

8.5 

12.0 

16.5 

15.0 
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TABLE V: A comparison of clustering error of our model with various dimensionality reduction models. The image data 
sets include: 1) MNIST and 2) USPS. The compared models are: 1) k-means 2) Normalized Cuts (NCuts) [14] 3) Laplacian 
Eigenmaps (LE) [1] 4) Standard Principal Component Analysis (PC A) 5) Graph Laplacian PC A (GLPCA) [7] 6) Robust Graph 
Laplacian PC A (RGLPCA) [7] 7) Manifold Regularized Matrix Eactorization (MME) [19] 8) Non-negative Matrix Eactorization 
[10] 9) Graph Regularized Non-negative Matrix Eactorization (GNME) [3] and 10) Robust PC A (RPC A) [4]. Two types of 
full and partial corruptions were introduced in the data: 1) Block occlusions and 2) Random missing values. The best results 
are highlighted in bold. 


Data Set 

Model 

No 

Corruptions 

Sample specific corruptions 

Full Corruptions 

Occlusions 
(25% of data) 

Missing values 
(25% of data) 

Occlusions 
(% of image size) 

Missing 

(% of image pixels) 

25% image size 

25% image pixels 

15% 

25% 

40% 

15% 

25% 

35% 

50% 

MNIST^ 

(digits) 

k-means 

62.2 

71.2 

59.6 

68.2 

79.2 

82.5 

62.0 

60.5 

60.6 

65.9 

NCuts 

69.3 

86.7 

59.0 

85.3 

87.3 

86.0 

77.7 

78.7 

83.7 

86.7 

LE 

71.0 

88.0 

55.3 

88.0 

89.3 

89.0 

68.7 

55.3 

86.3 

89.3 

PCA 

52.0 

60.0 

51.3 

68.3 

77.0 

77.3 

46.0 

57.3 

61.0 

57.0 

GLPCA 

49.3 

63.7 

44.7 

59.3 

71.7 

79.0 

45.3 

51.7 

56.3 

57.1 

RGLPCA 

46.3 

64.7 

51.0 

55.3 

65.3 

73.7 

52.7 

53.0 

53.7 

44.3 

MMF 

45.7 

65.7 

45.0 

52.7 

65.3 

77.3 

52.7 

44.7 

53.3 

55.7 

NMF 

46.3 

67.3 

33.7 

60.0 

72.3 

72.0 

52.0 

47.7 

49.7 

47.0 

GNMF 

49.7 

72.7 

35.3 

51.7 

71.0 

83.7 

50.7 

42.7 

44.7 

40.7 

RPCA 

39.0 

53.3 

43.0 

64.7 

73.7 

78.0 

44.7 

44.7 

52.7 

70.7 

Our model 

31.7 

46.0 

32.3 

53.0 

62.7 

69.7 

29.7 

33.7 

35.3 

37.7 

USPS 

(digits) 

k-means 

51.0 

45.3 

54.0 

55.4 

66.2 

73.8 

41.2 

41.7 

40.1 

38.9 

NCuts 

54.0 

49.3 

61.3 

68.3 

72.7 

75.7 

53.0 

47.7 

55.7 

55.3 

LE 

64.7 

52.7 

67.7 

74.3 

81.7 

83.0 

65.0 

71.7 

66.0 

63.3 

PCA 

49.7 

42.0 

50.3 

55.0 

66.0 

71.7 

43.3 

36.0 

39.3 

48.0 

GLPCA 

45.8 

26.7 

45.0 

46.9 

56.7 

68.9 

39.7 

38.3 

33.5 

32.5 

RGLPCA 

42.0 

33.3 

46.7 

45.0 

59.0 

63.7 

33.3 

26.7 

35.7 

37.0 

MMF 

33.0 

22.3 

40.7 

37.7 

50.3 

62.6 

21.5 

21.6 

20.7 

20.7 

RPCA 

28.3 

29.0 

46.7 

56.0 

67.7 

69.7 

29.0 

32.3 

32.6 

38.0 

Our model 

21.3 

21.7 

35.3 

42.0 

48.0 

56.0 

21.5 

21.3 

20.5 

27.3 


TABLE VI: A comparison of clustering error of PC A models and simple k-means for MEeat and BCI data sets. Each of the 
data sets was corrupted with two types of outliers: 1) Block occlusions and 2) Missing values. Block occlusions in non-image 
databases correspond to an unrealistic assumption so such experiments were no performed for these datasets. Eurthermore, 
NME and GNME were not evaluated because they require non-negative data whereas these datasets are negative as well. The 
best results are highlighted in bold. 


Data 

Set 

Model 

No 

Corruptions 

Sample specific 
Missing values 
(25% of data) 

25% features per sample 

Full Corruptions 

Missing values 
(% features per sample) 

15% 

25% 

35% 

50% 

MFeat ^ 

k-means 

32.4 

30.8 

25.5 

29.9 

32.2 

33.9 

NCuts 

39.8 

47.0 

26.3 

28.3 

33.5 

37.8 

LE 

38.0 

47.3 

35.3 

33.0 

33.5 

50.8 

PCA 

7.3 

7.3 

10.0 

11.8 

9.0 

13.8 

GLPCA 

6.0 

12.5 

6.0 

7.8 

8.0 

10.0 

RGLPCA 

9.5 

15.0 

15.0 

17.3 

8.3 

14.8 

MMF 

7.3 

6.3 

6.0 

5.3 

7.0 

7.0 

RPCA 

5.0 

4.3 

3.8 

6.3 

9.5 

15.0 

Our model 

3.3 

2.0 

3.5 

4.3 

6.8 

7.0 

BCI ' 

k-means 

47.8 

47.2 

47.8 

47.8 

48.0 

48.2 

NCuts 

47.0 

47.0 

47.3 

47.5 

47.3 

48.3 

LE 

46.5 

49.7 

49.8 

49.8 

49.8 

45.8 

PCA 

45.3 

45.2 

42.0 

43.0 

43.0 

45.5 

GLPCA 

46.5 

46.0 

45.8 

45.8 

46.5 

45.0 

RGLPCA 

45.5 

43.7 

46.5 

44.3 

43.8 

42.8 

MMF 

47.25 

47.2 

47.0 

47.5 

47.3 

48.3 

RPCA 

39.8 

43.0 

40.0 

42.8 

43.5 

43.0 

Our model 

40.3 

37.7 

39.0 

42.3 

42.0 

41.0 
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H. A Comparison of the Principal Directions U Learned with Low-Rank and Principal Components Graph 

1st two singulai 
values 


m 


Fig. 13: A comparison of the principal directions U learned with low-rank graph tr(L^L^) and principal components graph 
regularization tr((5^Q^) for the CMU PIE dataset. Each row shows the 1^^ two principal directions U and the corresponding 
singular values for each of the PCA models evaluated in this work. Eor the factorized models (PCA, GLPCA, RGLPCA and 
MME) the principal directions U are explicitly learned. Eor RPCA and our model, U is obtained by singular value decomposition 
of the low-rank matrix L, i.e. L = U'EQ. The low-rank graph for our model clearly helps in learning a corruption free U. 
Even though the 2^^ principal direction for our model has a large effect of occlusion, we note that the corresponding singular 
value (2^^ bar) is much smaller as compared to the 1^^. Eor all the other models, each of the two principal directions are 
affected by occlusions and the corresponding singular values are significant. This explains why a low-rank graph helps in 
an enhanced low-rank recovery and a lower clustering error in low dimensional space than the principal components 
graph. 
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I. Additional Low-Rank Recovery Results on CMU PIE Dataset 


Low Rank recovery of occluded images of CMU PIE dataset with the best visual results highlighted in red 



Fig. 14: A comparison of the clean low-rank recovered images of the CMU PIE data set corresponding to each of the PC A 
models considered in this work. For this experiment, the images of one person are corrupted by block occlusions occupying 
10% of the image. Each row corresponds to a different image of the same person occluded at a random position. 1^^ figure in 
each row shows the actual occluded image. 2'^^ and figures show the whitened occluded and un-occluded images. Since 
PCA requires whitening, the recovered low-rank images in figures 4 to 8 using GLPCA, RGLPCA, MMF, RPCA and our 
model resemble the un-occluded whitened image. The best recovered representations (via a visual inspection) are highlighted 
with a red box. 
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J. Additional Results for Background Separation from Videos 



(a) Restaurant food counter. The lightning effect (not static) is less visible in right figure than the middle one. 



10 20 30 40 50 60 70 80 10 20 30 40 50 60 70 80 10 20 30 40 50 60 70 80 


(b) Airport hallway. The moving person (not a part of static background) is less visible in the right figure than the middle one. 



(c) Shopping mall lobby. All the moving people are removed in both the middle and right figures. 

Fig. 15: Static background separation from three different videos, a) restaurant food counter, b) airport hallway and c) shopping 
mall lobby. In each of the cases: the leftmost figure shows one actual frame of the video, the middle figure shows the recovered 
static background using Robust PCA [4] with A = 1/ A/max(n,p) and the rightmost figure shows the recovered background 
using our model with A = l/A/max(n,p) and 7 = 10. The effect of graph can be appreciated in the first two cases. In a) the 
changes in the illumination (which are not a part of static background) are more visible in the Robust PCA model than our 
model. In b) the moving person is more obvious in Robust PCA recovered background than our model. For c) the two models 
perform equally well. 


K. Parameter Selection for the Proposed Model 

1) Parameter Selection for Clustering: 

2) Parameter Selection for Low-Rank Recovery: 
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Parameter Selection for Clustering 
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Fig. 16: Variation of clustering error over (A, 7) grid for different size, classes and occlusions in the CMU PIE data set. x-axis 
shows the variation with increasing occlusion size and y-axis with increasing size of the data matrix n and number of classes k. 
The horizontal red lines show a range of A G [1, 2]/-y/max(n,p). The clustering error always attains a minimum value within 
this range of A irrespective of the size of the data matrix, occlusions or the number of classes. Furthermore, a wide range of 
7 values attain a minimum clustering error in the region between the red lines. Thus a simple rule can be used to choose a 
set of good parameters: Fix A G [1,2]/ Y^max(n,p) and then perform a cross-validation over a coarse range ofj values. In 
fact A = 1/yanax(r 7 p) is always a good choice for the images which have no occlusions and are only affected by shadows 
and illumination changes. Each strip on the left of (A, 7) plot shows the variation of clustering error with A (y-axis) for 7 = 0 
(Robust PC A). Clearly, the use of a graph increases the range the possible A values which attain a minimum clustering error. 
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Parameter Selection for Exact Low Rank Recovery 

^ Rank — O.ln, Error — 14%^ 
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Fig. 17: Variation of low-rank normalized reconstruction error over (A, 7) grid for different ranks and errors (corruptions) in 
the artificially generated dataset with Bernoulli support and random sign scheme for sparse errors (Section VIII- A), x-axis 
shows the variation with increasing fraction of errors and y-axis with increasing rank of the data matrix X. The horizontal 
red lines show a range of A G [1, 2]/Y^max(n,p). The clustering error always attains a minimum value within this range of 
A irrespective of the size of the (ran/c, error) setting. Furthermore, a wide range of 7 values attain a minimum clustering 
error in the region between the red lines. Thus the same simple rule as clustering can be used to choose a set of good 
parameters: Fix A G [1, 2]/^max(n,p) and then perform a cross-validation over a coarse range of ^ values. Similar to our 
observation for clustering error, A = l/ 7 /max(n,p) is always a good choice for the datasets with low fraction of errors. 
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L. Comparison of Computation Times 


TABLE VII: A comparison of computational times (in secs) for one run of each model with increasing size of data matrix. 
For this experiment, different size of the CMU PIE dataset (n = 300, 600 and 1200) is corrupted with 20% occlusions and the 
computation time and accuracy is computed for one run of each model. To perform a fair comparison between the models, 
the parameter tuple for each model is chosen using the procedure of Section E. The non-convex models are run 10 times 
and the computational time and accuracy of the run with the minimum clustering error are reported. The convex models are 
run only once. Clearly, RGLPCA has the highest computation time, followed by our proposed model. However, the trade-off 
between the clustering error and computational time is worth observing. Our model takes more time to converge but attains the 
minimum clustering error. This large computation time is dominated by the expensive SVD step in every iteration. Our future 
work will be focused on reducing the complexity of this model by exploiting the distributed and parallel computation 
techniques. 


1 

Model 

n = 300 

n = 600 

n = 1200 

Time (secs) 

Clustering error (%) 

Time (secs) 

Clustering error (%) 

Time (secs) 

Clustering error (%) 

NCuts 

0.24 

47.2 

0.72 

48.3 

1.31 

49.0 

2 

LE 

0.24 

48.1 

0.70 

50.2 

1.24 

52.7 

3 

PCA 

0.11 

34.3 

0.17 

35.9 

0.20 

38.3 

4 

GLPCA 

0.12 

37.5 

0.43 

36.1 

1.60 

37.2 

5 

RGLPCA 

150.4 

34.8 

356.6 

35.3 

1187.6 

37.2 

6 

MMF 

0.13 

36.7 

0.32 

37.5 

1.52 

38.9 

7 

NMF 

0.15 

43.9 

0.62 

45.6 

1.30 

47.4 

8 

GNMF 

1.20 

41.2 

0.62 

40.3 

1.30 

45.9 

9 

RPCA 

59.8 

10.5 

159.3 

12.6 

678.3 

14.9 

10 

Our model 

69.7 

8.4 

169.6 

10.9 

869.8 

13.8 





















